Nonnegative Tensor Factorization with Frequency Modulation Cues for Blind Audio Source Separation
نویسندگان
چکیده
We present Vibrato Nonnegative Tensor Factorization, an algorithm for single-channel unsupervised audio source separation with an application to separating instrumental or vocal sources with nonstationary pitch from music recordings. Our approach extends Nonnegative Matrix Factorization for audio modeling by including local estimates of frequency modulation as cues in the separation. This permits the modeling and unsupervised separation of vibrato or glissando musical sources, which is not possible with the basic matrix factorization formulation. The algorithm factorizes a sparse nonnegative tensor comprising the audio spectrogram and local frequencyslope-to-frequency ratios, which are estimated at each time-frequency bin using the Distributed Derivative Method. The use of local frequency modulations as separation cues is motivated by the principle of common fate partial grouping from Auditory Scene Analysis, which hypothesizes that each latent source in a mixture is characterized perceptually by coherent frequency and amplitude modulations shared by its component partials. We derive multiplicative factor updates by MinorizationMaximization, which guarantees convergence to a local optimum by iteration. We then compare our method to the baseline on two separation tasks: one considers synthetic vibrato notes, while the other considers vibrato string instrument recordings.
منابع مشابه
Nonnegative Tensor Factorization for Directional Blind Audio Source Separation
We augment the nonnegative matrix factorization method for audio source separation with cues about directionality of sound propagation. This improves separation quality greatly and removes the need for training data, but doubles the computation.
متن کاملNotes on Nonnegative Tensor Factorization of the Spectrogram for Audio Source Separation: Statistical Insights and Towards Self-Clustering of the Spatial Cues
Nonnegative tensor factorization (NTF) of multichannel spectrograms under PARAFAC structure has recently been proposed by Fitzgerald et al as a mean of performing blind source separation (BSS) of multichannel audio data. In this paper we investigate the statistical source models implied by this approach. We show that it implicitly assumes a nonpoint-source model contrasting with usual BSS assum...
متن کاملNonnegative Tensor Factorization, Completely Positive Tensors, and a Hierarchical Elimination Algorithm
Nonnegative tensor factorization has applications in statistics, computer vision, exploratory multiway data analysis and blind source separation. A symmetric nonnegative tensor, which has an exact symmetric nonnegative factorization, is called a completely positive tensor. This concept extends the concept of completely positive matrices. A classical result in the theory of completely positive m...
متن کاملFast Local Algorithms for Large Scale Nonnegative Matrix and Tensor Factorizations
Nonnegative matrix factorization (NMF) and its extensions such as Nonnegative Tensor Factorization (NTF) have become prominent techniques for blind sources separation (BSS), analysis of image databases, data mining and other information retrieval and clustering applications. In this paper we propose a family of efficient algorithms for NMF/NTF, as well as sparse nonnegative coding and represent...
متن کاملMultichannel nonnegative matrix factorization in convolutive mixtures for audio source separation Factorisation en matrices à coefficients positifs de données multicanal convolutives pour la séparation de sources audio
We consider inference in a general data-driven object-based model of multichannel audio data, assumed generated as a possibly underdetermined convolutive mixture of source signals. We work in the Short-Time Fourier Transform (STFT) domain, where convolution is routinely approximated as linear instantaneous mixing in each frequency band. Each source STFT is given a model inspired from nonnegativ...
متن کامل